ABSTRACT

The ability to subtract foreground contamination from low-frequency observations is crucial to reveal 
the underlying 21 cm signal. The traditional line-of-sight methods can deal with the removal of diffuse 
emission and unresolved point sources, but not bright point sources. In this paper, we introduce 
a foreground cleaning technique in Fourier space, which allows us to handle all such foregrounds 
simultaneously and thus sidestep any special treatments to bright point sources. Its feasibility is tested 
with a simulated data cube for the 21 CentiMeter Array experiment. This data cube includes more 
realistic models for the 21 cm signal, continuum foregrounds, detector noise and frequency-dependent 
instrumental response. We find that a combination of two weighting schemes can be used to protect
the frequency coherence of foregrounds: uniform weighting in the uv plane and inverse-variance
weighting in the spectral fitting. The visibility spectrum is therefore well approximated by a quartic
polynomial along the line of sight. With this method, we demonstrate that the huge foreground
contamination can be cleaned out effectively, with residuals on the order of ~ 10 mK, while the
spectrally smooth component of the cosmological signal is also removed, leading to a systematic
underestimate of the extracted power spectrum, primarily on large scales.

Subject headings: cosmology: theory — diffuse radiation — intergalactic medium — methods: data 
analysis — radio lines: general — techniques: interferometric



1. INTRODUCTION 

As a direct probe of the intergalactic medium (IGM), 
the 21 cm line emitted by neutral hydrogen will pro- 
vide rather tight constraints on the early phase of cos- 
mic structure formation. Simulations of the IGM evo- 
lution have shown that the 21 cm radiation from the 
epoch of reionization (EoR) has a strength of ~ 10 mK, and is expected to oscillate
significantly with redshift (e.g. Di Matteo et al. 2002; Ciardi & Madau 2003;
McQuinn et al. 2006; Jelić et al. 2008). Low frequency interferometers like the
Low Frequency Array (LOFAR), Giant Metrewave Radio Telescope (GMRT), Murchison
Widefield Array (MWA), Precision Array to Probe Epoch of Reionization (PAPER), and
21 CentiMeter Array (21CMA) will aim to seek statistical detections of this
cosmological signal in the near future. Unfortunately, the redshifted 21 cm signal is
swamped by a long list of contaminants. The presence of Galactic and extragalactic
foreground sources, which contribute a brightness temperature on the order of ~ 100 K
at 100 MHz, poses a serious challenge for the upcoming observations
(Shaver et al. 1999; Furlanetto, Oh & Briggs 2006). In
this paper, we concentrate on the ability to subtract fore- 
grounds from radio interferometric measurements and 
further reveal the underlying 21 cm signal. 

Over the last decade, much effort has been made in exploring possible methodologies
for foreground subtraction (e.g. Di Matteo et al. 2002; Oh & Mack 2003;
Zaldarriaga, Furlanetto & Hernquist 2004; Furlanetto & Briggs 2004;
Santos, Cooray & Knox 2005; Wang et al. 2006; Morales, Bowman & Hewitt 2006;
Gleser, Nusser & Benson 2008; Liu, Tegmark & Zaldarriaga 2009a; Liu et al. 2009b;
Harker et al. 2010; Liu & Tegmark 2011). The most widely discussed proposal is the
line-of-sight (LOS) technique, which takes advantage of the foreground smoothness in
frequency space. Owing to the "mode-mixing" effect, previous studies were confined to
the removal of confusion-level contaminants (i.e. diffuse emission and unresolved
point sources), assuming that the bright and resolved point sources have been cleaned
out perfectly by other radio astronomy algorithms such as CLEAN or peeling. However,
for the upcoming 21 cm experiments, the subtraction of bright point sources with the
required precision is still a problem (Noordam 2004; Datta, Bhatnagar & Carilli 2009;
Datta, Bowman & Carilli 2010; Pindor et al. 2011; Bernardi et al. 2011). On the one
hand, the deconvolution of point sources is feasible in principle, but introduces
some, and perhaps considerable, artifacts due to the low dynamic range of most radio
images. On the other hand, the prior sky models at low frequencies are as yet poorly
constrained observationally, and will need to be improved continually in future
measurements.

In this paper, we exploit the frequency coherence of 
continuum foregrounds in Fourier space. Our goal is to 
develop a blind foreground subtraction technique which 
allows us to sidestep issues associated with the prior exci- 
sion of bright point sources. We first simulate the 21 cm 
interferometric measurements, including the instrumen- 
tal effects of the frequency-dependent primary beam and 
uv sampling. Techniques are then explored to protect the 
foreground smoothness along the LOS. With these im- 
proved techniques, we show that the visibility spectrum 
emitted from the resolved and unresolved point sources 
Fig. 1. — Brightness temperature images of the low-frequency foregrounds. The observing frequencies are 80 MHz (left panel) and 150
MHz (right panel) respectively. Unlike previous work, bright point sources are included in our foreground model. The apparent
angular size of sources reflects the frequency-dependent size of the synthesized beam in 21CMA observations.



together with our Galaxy can be well approximated with 
a smooth function and hence cleaned out simultane- 
ously. While our simulations require specific realizations 
of the array layout, the proposed method is expected 
to be applicable to all the first generation EoR experiments.
Zaldarriaga, Furlanetto & Hernquist (2004), Liu et al. (2009b) and
Harker et al. (2010) have fitted foregrounds as a function of frequency
in Fourier space, but they dealt only with the unresolved point sources.
Gleser, Nusser & Benson (2008) presented a de-contamination approach based
on the maximum a-posteriori probability (MAP) formalism, taking into account
the bright point sources but not the instrumental response.

The rest of this paper is organized as follows. In 
Section 2, we outline the proper simulations of the sky 
model, and introduce the simulated interferometric mea- 
surements. The detailed array parameters that have been 
used in the simulations are presented here. Our fore- 
ground removal technique is described carefully in Sec- 
tion 3. In addition, we measure the impact of foreground 
subtraction on the cosmological signal, and discuss how 
the residuals depend on the data reduction and antenna 
configuration. And in Section 4, we estimate the qual- 
ity and sensitivity of power spectrum extraction by us- 
ing the cleaned data cube. The method of suppressing 
the detector noise is also considered here. Finally, we 
discuss the implications of the results from our simula- 
tions, and present our recommendations for upcoming 
low-frequency experiments in Section 5.

2. SKY MODEL AND RADIO INTERFEROMETRY 

2.1. Simulations of Low-frequency Sky 

In this section, we outline the large-volume simula- 
tions adopted in the current work, focusing on the astro- 
physical foregrounds and the expected 21 cm signal from 
reionization. All the simulations are performed over a 
field-of-view of 10° x 10°, and the resulting sky maps are 



arranged onto grids of 500 x 500 pixels. Along the LOS, 
we use the frequency range extending from 130 MHz to 
170 MHz with the resolution of 0.1 MHz. 
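
The cube geometry described above can be written down in a few lines. The following is a minimal sketch rather than the pipeline actually used; the function name `build_cube_axes`, the inclusion of both band edges as channel centres, and the adopted 21 cm rest frequency are assumptions made for illustration.

```python
import numpy as np

F21_MHZ = 1420.40575  # rest-frame frequency of the 21 cm line (MHz)

def build_cube_axes(fov_deg=10.0, n_pix=500, f_min=130.0, f_max=170.0, df=0.1):
    """Return the pixel scale (arcmin), channel frequencies (MHz) and redshifts."""
    pixel_arcmin = fov_deg * 60.0 / n_pix
    # Assumption: both band edges are included as channel centres.
    n_chan = int(round((f_max - f_min) / df)) + 1
    freqs = f_min + df * np.arange(n_chan)
    # Each observing frequency maps to one redshift of the 21 cm line.
    redshifts = F21_MHZ / freqs - 1.0
    return pixel_arcmin, freqs, redshifts

pixel_arcmin, freqs, redshifts = build_cube_axes()
print(f"pixel scale: {pixel_arcmin:.2f} arcmin, channels: {freqs.size}")
print(f"redshift range: {redshifts.min():.2f} to {redshifts.max():.2f}")
```

With these inputs each pixel spans 1.2 arcmin and the band covers redshifts between roughly 7.4 and 9.9.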

2.1.1. Foreground Sources 

To simulate the low-frequency foregrounds with high fidelity, we employ the Monte
Carlo simulations presented by Wang et al. (2010) and references therein,
incorporating contributions from three main components: (1) Galactic synchrotron and
free-free emission; (2) galaxy clusters; and (3) extragalactic discrete sources such
as star-forming galaxies and AGNs. In order to construct the foreground model with
higher spatial and spectral accuracy, they first adopt the generic property that
radio spectra of foregrounds follow power-law shapes with a running spectral index,
and further consider in detail not only random variations of morphological and
spectroscopic parameters within the reasonable ranges allowed by multi-frequency
observations, but also the evolution of radio halos in galaxy clusters, assuming that
relativistic electrons are re-accelerated in the intra-cluster medium in merger
events and lose energy via both synchrotron emission and inverse Compton scattering
with cosmic microwave background (CMB) photons. Our foreground box was kindly
provided by the authors.

Galactic radio recombination lines (RRLs) can in principle introduce significant
structure in frequency space, but their narrow line widths ($\Delta\nu \sim 3$ kHz at
100 MHz) imply that they occur only in very narrow frequency bands. Moreover,
Petrovic & Oh (2011) have shown that the integrated extragalactic radio recombination
line background is also unlikely to constitute a formidable foreground. The RRLs are
therefore omitted in our analysis, since the contaminated regions of the spectrum can
easily be excised in future measurements. The simulated foreground maps as observed
by 21CMA are shown in Figure 1. Color versions of the figures are available in the
online journal.
Fig. 2. — Slices through the 21 cm brightness temperature box generated from 21cmFAST simulations, corresponding to $(z, x_{\rm HI})$ =
(10.279, 0.753), (9.078, 0.573), (8.167, 0.342), (6.893, 0.024) in a clockwise direction.



2.1.2. 21 cm EoR Signal 

We use the publicly available code 21cmFAST to generate the expected 21 cm
brightness temperature field (Mesinger & Furlanetto 2007;
Mesinger, Furlanetto & Cen 2011). For very large volumes, the semi-numerical
approach has the advantage of properly and rapidly creating the signals with
sufficient resolution. In what follows, we briefly summarize its scheme. The initial
conditions are set up in Lagrangian space at z = 300, and a Monte Carlo realization
of the density field and the velocity field is then established. Based on this, the
non-linear gravitational effects are treated using first-order perturbation theory
as described by the Zel'dovich approximation. In order to increase the speed and
dynamic range, 21cmFAST does not explicitly resolve source halos. Instead, the
excursion-set formalism is applied to estimate the mean density around a given point
on decreasing scales, allowing us to obtain the collapsed mass field. The state and
evolution of the ionization field are directly related to the density distribution.
Combined with the velocity gradient and spin temperature, the predicted 21 cm signal
from neutral hydrogen can be evaluated through (cf. Furlanetto, Oh & Briggs 2006)

where $dv_r/dr$ is the comoving velocity gradient along the
line of sight. As usually argued, fluctuations in the spin 
temperature introduce considerable contributions to the 
Fig. 3. — Statistics of the locations of the 40 antenna pods distributed
from west to east. The number counts are plotted as histograms.



EoR signal especially at higher redshifts. Refer to the 
original papers for more details. 
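
For orientation, the expression commonly quoted for the 21 cm brightness temperature offset, in the form given in the 21cmFAST papers cited above, is reproduced here. It is offered as an assumption about the intended formula rather than a verbatim restoration; $x_{\rm HI}$ denotes the neutral fraction, $\delta_{\rm nl}$ the evolved overdensity, $H(z)$ the Hubble parameter, and $T_S$ and $T_\gamma$ the spin and CMB temperatures.

```latex
\delta T_b \approx 27\, x_{\rm HI}\,(1+\delta_{\rm nl})
  \left(\frac{H}{dv_r/dr + H}\right)
  \left(1-\frac{T_\gamma}{T_S}\right)
  \left(\frac{1+z}{10}\,\frac{0.15}{\Omega_M h^2}\right)^{1/2}
  \left(\frac{\Omega_b h^2}{0.023}\right)\ {\rm mK}
```

In the limit $T_S \gg T_\gamma$ the factor $(1 - T_\gamma/T_S)$ tends to unity, so the signal no longer depends on the spin temperature.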

In our work, the 10° x 10° sky region corresponds to a comoving window on the
order of 1000 Mpc x 1000 Mpc at EoR redshifts. The initial density field simulation
therefore involves a $1000^3$ Mpc$^3$ cosmological box with $2000^3$ cells, i.e.,
~ 0.5 Mpc per cell on a side. To keep the computational time moderate, we choose to
smooth the evolved density field and velocity field onto a $500^3$ grid, and then
generate the ionization field with the assumption of $T_S \gg T_\gamma$ at lower
redshifts.

Though the 21 cm signal is expected to be at least a factor of $10^4$ smaller than
the foregrounds, including it in the sky model is necessary for distortion analysis:
how does the cleaning process affect the cosmological signal? Since the purpose of
this work is not only to develop the foreground removal technique but also to test
its usefulness in realistic measurements, a sufficiently detailed model for the 21 cm
signal is important for drawing our basic conclusions. In Figure 2, we plot the 21 cm
brightness temperature maps at different redshifts from the 21cmFAST simulation boxes.

2.2. Radio Interferometry with 21CMA

We employ the West-East baseline of the 21CMA to produce the specific uv sampling.
There are 40 antenna pods located along this baseline, and the effective collecting
area of a pod is 218 m$^2$ at 150 MHz. All the antennas point toward the North
Celestial Pole (NCP), and hence continuously observe a fixed patch of the sky. We
show the distribution of the 40 antenna pods in Figure 3, and the integrated uv
coverage is plotted in Figure 4.

For a realistic interferometer, the fundamental observable is a set of complex
visibilities, which can be defined as

$$V_\nu(u,v) = \int\!\!\int A_\nu(l,m)\, I_\nu(l,m)\, e^{-2\pi i(ul+vm)}\, dl\, dm \qquad (2)$$

in the flat-sky approximation. Here, $I_\nu(l,m)$ is the sky brightness distribution,
and $A_\nu(l,m)$ describes the primary beam of an interferometer pair (i.e., the
normalized reception pattern). For simplicity, we will henceforth use $I'_\nu(l,m)$
to denote the modified sky brightness, $A_\nu(l,m) I_\nu(l,m)$. In practice, the
complex visibility cannot be known everywhere; only a finite number of samples are
measured on the uv plane (as shown in Figure 4). The sampling process can be
described by a sampling function $S_\nu(u,v)$, which is zero where no data have been
taken. As a result, $I_\nu(l,m)$ itself cannot be recovered directly; instead one
obtains the so-called dirty map $I^D_\nu$, where

$$I^D_\nu(l,m) = \int\!\!\int S_\nu(u,v)\, V'_\nu(u,v)\, e^{2\pi i(ul+vm)}\, du\, dv, \qquad (3)$$

and $V'_\nu(u,v)$ denotes the noise-corrupted visibilities. Using the convolution
theorem for Fourier transforms, its relation to the desired intensity distribution
$I_\nu(l,m)$ can be written as

$$I^D_\nu(l,m) = I'_\nu(l,m) * B_\nu(l,m), \qquad (4)$$

where the in-line asterisk means convolution, and

$$B_\nu(l,m) = \int\!\!\int S_\nu(u,v)\, e^{2\pi i(ul+vm)}\, du\, dv \qquad (5)$$

is the synthesized beam or point spread function (PSF) corresponding to the uv
distribution of baselines. These equations indicate that the measured visibilities in
real observations can be simulated as the Fourier transform of the modified intensity
distribution $I'_\nu$, corrupted by the telescope noise and then multiplied by the uv
sampling function $S_\nu$.
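
A discrete analogue of these relations can be sketched with FFTs. The code below is illustrative only: the function names, the toy 16 x 16 grid, the random sampling mask and the Gaussian beam are all assumptions, and the continuous integrals are replaced by NumPy's discrete transforms, so the convolution is circular. It checks numerically that the dirty map equals the beam-modified sky convolved with the synthesized beam.

```python
import numpy as np

def simulate_dirty_map(sky, primary_beam, sampling, noise=None):
    """Dirty map of a sky image for a given primary beam and uv sampling mask."""
    modified_sky = primary_beam * sky        # I'_nu = A_nu * I_nu
    vis = np.fft.fft2(modified_sky)          # discrete analogue of Eq. (2)
    if noise is not None:
        vis = vis + noise                    # noise-corrupted visibilities
    return np.fft.ifft2(sampling * vis)      # discrete analogue of Eq. (3)

def synthesized_beam(sampling):
    """Point spread function implied by the sampling mask, as in Eq. (5)."""
    return np.fft.ifft2(sampling)

def circular_convolve(image, kernel):
    """Direct (slow) circular convolution, used only to check Eq. (4)."""
    out = np.zeros(image.shape, dtype=complex)
    for p in range(image.shape[0]):
        for q in range(image.shape[1]):
            shifted = np.roll(np.roll(kernel, p, axis=0), q, axis=1)
            out += image[p, q] * shifted
    return out

rng = np.random.default_rng(0)
n = 16
sky = rng.random((n, n))                                   # toy sky, arbitrary units
yy, xx = np.mgrid[:n, :n] - n // 2
beam = np.exp(-(xx**2 + yy**2) / 5.0**2)                   # toy Gaussian primary beam
sampling = (rng.random((n, n)) > 0.7).astype(float)        # toy uv coverage mask

dirty = simulate_dirty_map(sky, beam, sampling)
check = circular_convolve(beam * sky, synthesized_beam(sampling))
print("Eq. (4) holds numerically:", np.allclose(dirty, check))
```

Because the toy mask is not Hermitian-symmetric, the dirty map here is complex; a real interferometer samples conjugate pairs of uv points, which makes the map real.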

In order to understand how the instrumental response 
impacts the foreground subtraction, our simulations are 
passed through the observational pipeline. We first es- 
tablish the original image cube consisting of the astro- 
physical foregrounds and the 21 cm signal in our fre- 
quency range. And the primary beam can be generally 
approximated by a Gaussian $A_\nu(\theta) = \exp(-\theta^2/\theta_b^2)$ with
width $\theta_b \sim 0.6\lambda/D$, where $D$ is the physical size of an
antenna pod. At each frequency channel, the sky image 
is multiplied by the primary beam, and in turn related 
Fig. 4. — Density of visibility measurements in the uv plane at
150 MHz with integration time of 24 hours. Only the 40 West-East 
pods of 21CMA are involved. 



to visibilities via the two dimensional Fourier transform. 
Subsequently, we estimate the noise visibilities for one year of integration. The
rms noise per visibility per frequency channel can be given by
(Rohlfs & Wilson 2004; McQuinn et al. 2006)

in which $A_e$ and $\Omega$ are the effective area and the beam solid angle of an
interferometer element respectively, $\Delta\nu$ is the bandwidth of a single
frequency channel, and $t$ is the total integration time for sampling a given
$(u,v)$ location. For 21CMA, we assume the sky-dominated system temperature to be
$T_{\rm sys} \approx 440[(1+z)/9]^{2.6}$ K. Meanwhile, we approximate the integration
time as $t = \tau N(u,v)$, where $\tau = 5$ s is the accumulation duration for each
visibility measurement, and $N(u,v)$ is the number of independent samples in that
pixel (Bharadwaj & Ali 2005; Bowman, Morales & Hewitt 2009). Because the thermal
noise is random, we draw complex visibilities from Gaussian distributions with zero
mean and the rms described above. From Figure 4, one can infer that the detector
noise would increase significantly toward the outer regions of the uv plane, owing to
the sparse coverage. Finally, the real-world sampling is applied in the uv plane. The
contribution of any single visibility measurement is assigned to only one grid cell.
We further normalize the baseline distribution to ensure that each pixel in the
sampled part of the uv plane has the same weight. With this uniform weighting, we
artificially emphasize the information contained in long baselines and increase the
effective resolution of the derived sky maps. A natural weighting scheme should not
be chosen, since the number of visibility measurements in a uv pixel changes with
wavelength, inducing fluctuations along the frequency direction.
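
The gridding and uniform weighting steps can be illustrated as follows. This is a sketch under stated assumptions, not the simulation code itself: the helper names `grid_uniform` and `integration_time`, the nearest-cell assignment and the toy coordinates are hypothetical, while the system temperature scaling and the 5 s accumulation time are taken from the text.

```python
import numpy as np

def system_temperature(z):
    """Sky-dominated system temperature in K, T_sys ~ 440 [(1+z)/9]^2.6."""
    return 440.0 * ((1.0 + z) / 9.0) ** 2.6

def grid_uniform(u, v, vis, n_grid, cell):
    """Grid visibilities onto an n_grid x n_grid uv plane with uniform weighting.

    Each sample contributes to exactly one cell. Sampled cells are divided by
    their sample count, so every sampled cell ends up with the same weight.
    """
    iu = np.floor(u / cell).astype(int) + n_grid // 2
    iv = np.floor(v / cell).astype(int) + n_grid // 2
    inside = (iu >= 0) & (iu < n_grid) & (iv >= 0) & (iv < n_grid)
    total = np.zeros((n_grid, n_grid), dtype=complex)
    counts = np.zeros((n_grid, n_grid), dtype=int)
    np.add.at(total, (iv[inside], iu[inside]), vis[inside])
    np.add.at(counts, (iv[inside], iu[inside]), 1)
    gridded = np.zeros_like(total)
    sampled = counts > 0
    gridded[sampled] = total[sampled] / counts[sampled]
    return gridded, counts

def integration_time(counts, tau=5.0):
    """Total integration time per cell, t = tau * N(u, v), in seconds."""
    return tau * counts

# Toy example: two samples share one cell, a third falls elsewhere.
u = np.array([0.2, 0.4, 3.1])
v = np.array([0.1, 0.3, -2.5])
vis = np.array([1.0 + 1.0j, 3.0 + 1.0j, 2.0 + 0.0j])
gridded, counts = grid_uniform(u, v, vis, n_grid=8, cell=1.0)
t_cell = integration_time(counts)
print(counts.sum(), gridded[4, 4], t_cell[4, 4], system_temperature(8.0))
```

The array `counts` is the quantity $N(u,v)$ that later sets the fitting weights, and cells with `counts == 0` are the ones skipped in the spectral fit.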

Following all these simulations, we generate our visibility cube representing actual
measurements. In the next section, we will concentrate on how to remove the
astrophysical foregrounds and reveal the cosmological EoR signal by using the
multi-frequency visibilities.

Fig. 5. — Residual spectra after foreground subtraction along the same line of sight. The orders of the fitting polynomials are $N$ = 3, 4, 5
from top to bottom. The thick solid line and dotted line are the input 21 cm signal and detector noise respectively. The thin solid line
shows the visibility spectrum in the cleaned data cube (after polynomial subtraction), including the residual 21 cm signal, detector noise
and fitting errors.

3. FOREGROUND SUBTRACTION 

As mentioned above, foreground contamination, which exceeds the cosmological signal
by at least four orders of magnitude, seems formidable in low-frequency experiments.
Differences in symmetry between the two have therefore been well studied as a way to
separate them from each other. Since the EoR emission appears as bumps along both the
frequency and angular directions, the redshifted 21 cm signal is expected to be
spherically symmetric in 3D space (ignoring redshift space distortions), and to
fluctuate rapidly in all three dimensions. In contrast, continuum foregrounds have
strong fluctuations in the transverse direction across the sky but weak ones in the
radial direction.

In general, a traditional foreground subtraction strategy includes three steps:
bright source removal, spectral fitting, and residual error subtraction
(Morales, Bowman & Hewitt 2006). In order to protect the frequency coherence of
foregrounds, bright point sources have to be subtracted down to a 10-100 mJy
threshold prior to the LOS spectral fitting step, because the incomplete uv coverage
of an interferometer changes with observing frequency and thus creates different
sidelobe patterns across the sky maps, inducing the "mode-mixing" effect emphasized
in Bowman, Morales & Hewitt (2009) and Liu, Tegmark & Zaldarriaga (2009a). We note
that since the bright point sources have spectra with the same functional form as the
unresolved point sources, the former will not by themselves destroy the frequency
coherence. The frequency decoherence seen in real space is caused only by the
frequency-dependent telescope response. If we can accurately describe the change of
the instrumental response, it will no longer limit the continuum subtraction. In
Fourier space, one can easily identify pixels with different uv sampling, and hence
employ an inverse-variance weighting scheme to describe their information content. In
this way, we automatically skip those empty frequency channels at which a pixel is
not sampled, while giving data points with higher signal-to-noise greater weights.
Basically, this "frequency-skipping" effect protects the foreground smoothness along
the LOS. As a result, the first two steps in the traditional strategy can be reduced
to one: spectral fitting.

We now apply our method to the simulated visibility 
data cube. The frequency range $130 \le \nu \le 170$ MHz is
divided into several sub-bands, over each of which the wavelength
varies by less than 10%. Foregrounds are then subtracted
individually from each sub-band. This operation
offers two major advantages. Firstly, since the bandwidth
of a sub-band is small compared to the observing
frequency, the primary beams increase slowly toward
Fig. 6. — Foreground subtraction in a Fourier-space pixel from
the part of the uv plane where the baseline coverage is complete.
Top panel: Frequency spectrum along the line of sight. A foreground
fit (solid line) to the visibility data points (open circles) is
plotted, in the presence of the 21 cm signal and noise. Bottom
panel: Residuals after the fitting polynomial is subtracted from the
input data. The thick solid line and dotted line correspond to the
original signal and noise, while the thin solid line represents the
post-subtraction residuals. Clearly, the spectrally smooth component
of the cosmological signal has been unavoidably removed by
the polynomial subtraction. However, fluctuations in the 21 cm
signal are preserved well in the residual spectrum.




Fig. 7. — Same as Figure 6, but for the polynomial-subtraction
technique applied to the part of the uv plane where the baseline
coverage is sparse. Owing to the sparse uv coverage, the detector
noise increases significantly and hence dominates the residual
visibilities.



smaller frequencies, and thus the contribution from point sources will be a smooth
function of frequency that can be accurately matched by the polynomial fit. Secondly,
we can expect to extract the HI power spectrum by cross-correlating two sub-bands, so
that the thermal noise power spectrum does not have to be known. A detailed
introduction to this will be given in the next section.
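
One simple way to realize such a division is sketched below. The reading of "varies by less than 10%" as a ratio of at most 1.1 between the band edges, and the function name `split_into_subbands`, are assumptions made for illustration; the actual sub-band edges adopted in the simulations may differ.

```python
def split_into_subbands(f_min=130.0, f_max=170.0, max_ratio=1.1):
    """Split [f_min, f_max] (MHz) into sub-bands with f_high / f_low <= max_ratio."""
    edges = [f_min]
    while edges[-1] < f_max:
        # Extend each sub-band as far as the ratio allows, capped at the band edge.
        edges.append(min(edges[-1] * max_ratio, f_max))
    return list(zip(edges[:-1], edges[1:]))

subbands = split_into_subbands()
for low, high in subbands:
    print(f"{low:.1f} to {high:.1f} MHz")
```

Under this assumption the 130 to 170 MHz band splits into three sub-bands, the first of which runs from 130 to 143 MHz.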

For a given sub-band, there are $m_0$ frequency channels. Within one pixel, we let
$y_i$ denote the measured visibility at frequency $\nu_i$ with weight $w_i$, for
$i = 1, 2, \cdots, m_0$. The weight $w_i$ is proportional to the number of baselines
that are binned into this uv pixel at the frequency $\nu_i$. Because of the
"frequency-skipping" effect, the effective number of frequency channels used in the
polynomial fit is reduced to $m$. For our representative method, we directly fit
$y_i$ with a set of basis functions

$$\lg(y_i) = \sum_{k=0}^{N} a_k T_k(x_i), \qquad (7)$$

in which $x_i = \lg(\nu_i)$, and $T_k$ are the Chebyshev polynomials of the first
kind. The coefficients $a_k$ are chosen to minimize $\sigma$, the sum of squares of
the weighted residuals $e_i$, where

$$e_i = w_i[\lg(y_i) - f_i] \qquad (8)$$

for $i = 1, 2, \cdots, m$. Here $f_i$ is the value of the polynomial at the $i$th
channel. For the complex visibilities, the real and imaginary parts are fitted
separately.
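
The weighted fit with channel skipping can be sketched with NumPy's Chebyshev class, whose weights multiply the unsquared residuals in the same way as the weighted residuals defined above. This is a minimal illustration, not the authors' code: the function name `fit_foreground`, the toy power-law spectrum and the random weights are assumptions, and the sketch assumes the fitted quantity is strictly positive wherever it is sampled so that the logarithm is defined.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def fit_foreground(freqs, y, weights, order=4):
    """Weighted Chebyshev fit of lg(y) against lg(nu); returns the smooth model.

    Channels with zero weight (unsampled uv cells) are skipped entirely.
    Assumption: y is strictly positive wherever the weight is non-zero.
    """
    used = weights > 0
    x = np.log10(freqs)
    series = Chebyshev.fit(x[used], np.log10(y[used]), deg=order, w=weights[used])
    return 10.0 ** series(x)

rng = np.random.default_rng(1)
freqs = np.arange(130.0, 143.0, 0.1)                    # one sub-band, MHz
foreground = 300.0 * (freqs / 150.0) ** -2.6            # toy power-law spectrum, K
weights = rng.integers(0, 5, size=freqs.size).astype(float)  # toy baseline counts
observed = np.where(weights > 0, foreground, 0.0)       # empty channels hold no data

model = fit_foreground(freqs, observed, weights, order=4)
used = weights > 0
residual = observed[used] - model[used]
print("max |residual| in K:", np.abs(residual).max())
```

A pure power law is a straight line in log-log space, so the quartic fit recovers it to numerical precision here; with a real sky and instrument the residual contains the 21 cm signal, noise and fitting errors.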
Clearly, the key to approximating the visibility spectrum lies in the choice of the
order $N$ of the polynomial. If $N$ is too low, there are insufficient degrees of
freedom to remove the foregrounds efficiently; if $N$ is too high, some of the
cosmological signal may be mistaken for foregrounds (Furlanetto, Oh & Briggs 2006).
In fact, the Galactic foreground spectral index fluctuates as a function of both
frequency and position, and for extragalactic foregrounds, the sum of power law
spectra will not be a power law. Taking these foreground properties and the
instrumental response into account, we first treat $N$ as a free parameter, and fit
the visibility spectra with different choices of $N$ along different LOSs. We find
that this treatment is not helpful in revealing the faint EoR signal, and introduces
structure in the final estimate of the power spectrum. In Figure 5, we plot the
polynomial-subtraction residuals along the same LOS for three different orders. One
can see that for $N = 3$ (top panel), the residual visibility spectrum is dominated
by a slowly-varying component, indicating that the foreground contamination is not
removed effectively, while for $N = 4$ (middle panel) and $N = 5$ (bottom panel), the
residual spectra have similar shapes. As emphasized by other authors, one should
perform the polynomial fit with an order as low as possible. Here, the order is
therefore set to a constant $N = 4$ for all pixels.
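
The statement that a sum of power laws is not itself a power law, and the resulting dependence of the residual on the fitting order, can be checked with a toy spectrum. The amplitudes and spectral indices below are invented for illustration, and the sketch ignores the instrumental response, so the absolute residual levels are not comparable with those in Figure 5.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def log_residual_rms(freqs, y, order):
    """RMS of lg(y) minus its best-fitting Chebyshev series of the given order."""
    x = np.log10(freqs)
    series = Chebyshev.fit(x, np.log10(y), deg=order)
    return float(np.sqrt(np.mean((np.log10(y) - series(x)) ** 2)))

freqs = np.arange(130.0, 143.0, 0.1)   # one sub-band, MHz
# Toy spectrum (assumed values): two power laws with different indices, in K.
toy = 250.0 * (freqs / 150.0) ** -2.7 + 40.0 * (freqs / 150.0) ** -0.7

rms_by_order = {n: log_residual_rms(freqs, toy, n) for n in (3, 4, 5)}
for n, rms in rms_by_order.items():
    print(f"N = {n}: rms residual in lg(y) = {rms:.2e}")
```

Each extra order lowers the residual of the smooth component, which is why the order has to be capped by hand: a sufficiently flexible polynomial would also begin to absorb the cosmological signal.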

Figure 6 shows the pristine frequency spectrum (top panel) as well as the
post-subtraction residuals (bottom panel) in a uv pixel located near the origin. As
can be seen, the huge contaminants (open circles in top panel), including emission
from our Galaxy, galaxy clusters, unresolved point sources and bright point sources,
can be well approximated by a smooth function (solid line in top panel) and
effectively removed, leaving much smaller residuals (thin solid line in bottom
panel). The remaining data vary rapidly with frequency, which means that the
residuals are not dominated by foregrounds. Compared to the original signal and noise
(thick solid line and dotted line in bottom panel), one may note that the spectrally
smooth component of the cosmological signal has been accidentally removed during
Fig. 8. — Foreground subtraction technique applied to the simulated interferometric measurements. The top row gives two cuts for the
input visibility cube, showing a uv-map at a single frequency channel $\nu$ = 138 MHz (left panel) and a slice along the frequency direction
and $v$-axis in the sub-band (right panel). The bottom row gives the same cuts, but for the cleaned visibility cube following foreground
subtraction.



foreground subtraction. Fortunately, foreground cleaning alters the input signal
primarily on large scales, and the essential structural component survives the
polynomial subtraction. In Figure 7, we present results for a typical uv pixel
residing in the part of the uv plane where the baseline coverage is sparse. We can
see that it is important to skip the empty frequency channels during the weighted
polynomial fit.

In order to protect the spectrally smooth component of the cosmological signal, we
tried fitting the visibility spectrum at lower frequency resolutions,
$\Delta\nu$ = 1, 2, 3 MHz; these produced residuals of similar quality. We find that
the destruction of the large-scale signal would be common to all LOS foreground
subtraction schemes.

Figure 8 further illustrates the results of applying the proposed subtraction method
to the simulated sky model with frequency-dependent instrumental response.
Qualitatively, one can see that the spectral fitting in Fourier space is sufficient
to remove the foreground contamination, and the residual visibilities have been
suppressed to a level on the order of ~ 10 mK. This permits a simpler procedure for
foreground cleaning in which the prior removal of bright point sources is omitted.

Our method is not without disadvantages. In principle, 
the lower signal-to-noise in each sub-band will degrade 
the foreground fitting. We have conducted preliminary 
tests of this method, and found that at least tens of chan- 
nels are required to give comparable results. 



4. POWER SPECTRUM EXTRACTION 



After foreground subtraction, we are left with a residual visibility cube containing
the 21 cm signal, thermal noise and fitting errors. Since we may be unable to compute
the noise power spectrum accurately enough a priori for real experiments, it cannot
simply be assumed that the desired power spectrum is estimated by subtracting the
noise power spectrum from the residual power spectrum. In practice, a possible
approach is to exploit the independence of the thermal noise at two different epochs,
and then average its mean power to zero by cross-correlation, leaving only the
thermal uncertainty (Bowman, Morales & Hewitt 2009; Harker et al. 2010). Besides the
signal being uncorrelated with the noise, the fitting errors are also uncorrelated
with the noise in the cross-correlation, and the residual power spectrum therefore
contains only three components: signal, fitting errors and their cross-terms. As long
as the fitting errors are sufficiently small, this cross-correlation immediately
provides us with an estimate of the desired power spectrum.
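
The way cross-correlation removes the noise bias can be demonstrated with random numbers. The sketch below is a toy model with assumed signal and noise powers, not the estimator applied to the simulated cube: two data sets share the same signal but carry independent noise, so the noise power survives in the auto-correlation and averages away in the cross-correlation.

```python
import numpy as np

def complex_gaussian(rng, size, power):
    """Zero-mean complex Gaussian samples with mean squared modulus `power`."""
    scale = np.sqrt(power / 2.0)
    return rng.normal(0.0, scale, size) + 1j * rng.normal(0.0, scale, size)

rng = np.random.default_rng(42)
n_samples = 200_000
signal = complex_gaussian(rng, n_samples, power=1.0)        # common to both data sets
v1 = signal + complex_gaussian(rng, n_samples, power=4.0)   # independent noise, set 1
v2 = signal + complex_gaussian(rng, n_samples, power=4.0)   # independent noise, set 2

auto_power = np.mean(np.abs(v1) ** 2)             # biased by the noise power
cross_power = np.mean((v1 * np.conj(v2)).real)    # noise bias averages to zero

print(f"auto-correlation : {auto_power:.3f} (signal + noise power)")
print(f"cross-correlation: {cross_power:.3f} (signal power only)")
```

The noise still contributes to the scatter of the cross-correlation, which is the thermal uncertainty referred to in the text; only its mean contribution vanishes.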

Using visibility cross-correlation, we define the angular power spectrum
(Bharadwaj & Ali 2005; Ali, Bharadwaj & Chengalur 2008; Datta, Bowman & Carilli 2010)

$$C_\ell = \frac{\sum_{\ell = 2\pi|\mathbf{u}|} W(\mathbf{u})\, V_\nu(\mathbf{u})\, V^*_{\nu+\Delta\nu}(\mathbf{u})}{a \sum_{\ell = 2\pi|\mathbf{u}|} W(\mathbf{u})}, \qquad (9)$$

where $|\mathbf{u}| = \sqrt{u^2 + v^2}$, $a = \pi\theta_b^2/2$ describes the effect of
the primary beam, and $W(\mathbf{u})$ and $V_\nu(\mathbf{u})$ are the natural
weighting function and the visibilities measured at frequency $\nu$ respectively. We
choose the frequency channels from different sub-bands over which the foreground
contaminants are subtracted independently. Two general points are worth noting here.
On the one hand, the two frequencies $\nu$ and $\nu + \Delta\nu$ are just slightly
different, i.e. $\Delta\nu/\nu \ll 1$. The typical correlation separation of two
frequency channels is $\Delta\nu = 0.1$ MHz. Thus, the frequency separation will
cause a very small change in the primary beam, and any evolution of the signal can be
neglected in our analysis. On the other hand, the accuracy of foreground cleaning
diminishes toward the ends of each sub-band, since there are fewer neighboring
channels for the polynomial fit. As a result, only the central channels of sub-bands
should be chosen to estimate the power spectrum. In principle, the cross-correlation
will eliminate the thermal noise and preserve the persistent 21 cm signal.

Fig. 9. — Angular power spectra of the input foregrounds (dashed-dotted lines), the 21 cm signal (solid lines), the thermal noise (dashed
lines), and the extracted signal (filled circles) at two different redshifts. These power spectra are calculated for one frequency bin of width
0.1 MHz from our simulated or cleaned data cubes. The error bars are at the $1\sigma$ confidence level.

The error bars on the extracted power spectra reflect the statistical errors due to
the detector noise as well as sample variance. We calculate the contribution from the
noise in a Monte Carlo fashion by measuring the standard deviation of independent
realizations of the thermal noise. The error from sample variance is estimated by
$C_\ell/\sqrt{m_\ell}$, where $m_\ell$ is the number of cells within an annulus near
$\ell$. We also confirm that cross-correlation no longer works well when the
foregrounds are subtracted over the full bandwidth. In this case, we can no longer
assume that the fitting errors and noise are uncorrelated.
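
The sample variance term can be evaluated once the number of uv cells per annulus is known. The following sketch counts cells on a regular uv grid; the grid size, the cell size of one over the field of view, the bin edges and the function names are assumptions chosen to resemble the simulated cube rather than values quoted in the text.

```python
import numpy as np

def cells_per_annulus(n_grid, du, ell_edges):
    """Count uv cells m_ell whose multipole ell = 2*pi*|u| falls in each bin."""
    coords = (np.arange(n_grid) - n_grid // 2) * du
    uu, vv = np.meshgrid(coords, coords)
    ell = 2.0 * np.pi * np.hypot(uu, vv)
    counts, _ = np.histogram(ell.ravel(), bins=ell_edges)
    return counts

def sample_variance_error(c_ell, m_ell):
    """Sample variance error on C_ell from m_ell independent cells."""
    return c_ell / np.sqrt(m_ell)

du = 1.0 / np.deg2rad(10.0)          # assumed uv cell size for a 10 degree field
ell_edges = np.array([500.0, 1000.0, 2000.0, 5000.0])
m_ell = cells_per_annulus(n_grid=500, du=du, ell_edges=ell_edges)
relative_error = sample_variance_error(1.0, m_ell)
print("cells per annulus:", m_ell)
print("fractional sample variance error:", relative_error)
```

Annuli at small $\ell$ contain few cells, so the fractional error grows toward large angular scales.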

In Figure 9, we present the final estimates of the angular power spectra at two
redshifts. The recovered 21 cm power spectrum (filled circles) appears to be accurate
and has small errors at $500 < \ell < 5000$, while it lies systematically below the
input one (solid line) on large scales. The error bars grow in size at large scales
due to sample variance. Clearly, the foreground cleaning process suppresses power in
the statistical measures. These biases can be attributed to the distortion of the EoR
signal seen in Fig. 6. As pointed out above, the cleaning process removes
slowly-varying modes in the simulated data at the expense of attenuating the
cosmological signal by accidentally removing its large-scale fluctuations. This trend
continues as we move to the lower redshift slice (bottom panel). For the sake of
comparison, the foreground power (dashed-dotted line) and the thermal noise power
(dashed line) before subtraction and cross-correlation are also plotted here. Nearly
1 year of integration is required for the thermal noise to become small enough that
the cosmological signal can be expected to dominate the angular power spectrum. One
can see that it is important to include the faint 21 cm signal in the cleaning step,
especially for understanding the foreground subtraction technique more
comprehensively and for testing its practicability.

5. DISCUSSION 

In this paper, we studied the foreground removal problem 
for the upcoming EoR experiments. With simulations 
of the 21 cm interferometric measurements, we investigated 
the effect of instrumental response on the foreground 
removal strategy, and further developed a trend 
removal technique in Fourier space. Unlike previous 
work, the proposed method treats the confusion-level 
contaminants and the bright point sources as equivalent, 
and then cleans all such foregrounds simultaneously 
using only the LOS spectral fitting. We illustrated that 
this method allows us to avoid the complications due to the 
special treatments of bright point sources, which are 
unavoidable in traditional foreground subtraction approaches. 
The basic reason that it works so well is that 
the frequency dependence of the discrete uv sampling can 
be well described through the inverse-variance weighting 
scheme in Fourier space. 

For our method, there is no need to construct 
proper bright point source models for cleaning, 
and its computational cost is in general modest. Moreover, 
we do not discard any available data, which increases the 
signal-to-noise ratio. We note that the residuals 
are less than one part in $10^6$ of the original sky model, 
as we see in Fig. 8, indicating that the visibilities measured 
in observations should reach a high dynamic range of 
$> 10^6 : 1$. Furthermore, one should keep in mind that 
the required dynamic range of the visibilities will increase with 
the strength of the observed foregrounds. For a field of view around the NCP 
(the case of 21CMA), the strength of foregrounds may 
be little different from our sky model, and thus the dynamic 
range of measured visibilities should be at least 
$10^6 : 1$ to detect the EoR signal. 

Although our results are quite encouraging for foreground 
subtraction, much work remains to be done in 
this regard. Systematic biases between the input signal 
and the recovered signal seem to be unavoidable in the 
LOS polynomial fit. We find that the process of foreground 
cleaning itself accidentally removes the smooth 
component of the cosmological signal, and hence leads to 
suppression in the final estimate of the angular power 
spectrum on large scales. In the current work, these artifacts 
have been clearly presented. The astrophysical and 
cosmological inferences will be significantly distorted 
unless the effect of foreground cleaning is correctly 
understood. For real observations, however, the systematic 
underestimate of the true power spectrum would be 
more difficult to assess accurately. Making this sort 
of correction will always be uncertain, since it depends 
not only on earlier data processing steps, such 
as instrumental calibration, but also on the statistical 
properties of the true 21 cm signal. We therefore do not 
pursue this estimate in the present work. Studying this 
in more detail in the context of a specific experiment must 
be the subject of future work. Different techniques 
to remove the foregrounds should also be explored. 
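
The way a LOS polynomial fit removes the smooth part of the signal together with the foregrounds can be seen in a one-dimensional toy example. The sketch below fits a quartic polynomial with inverse-variance weights to a single real-valued spectrum and subtracts it. The sub-band, the noise profile, the power-law index, the amplitudes, and the helper name `clean_los` are hypothetical choices for illustration; the fit in the paper operates on simulated visibilities, which this sketch does not attempt to reproduce.

```python
import numpy as np

def clean_los(x, spectrum, sigma, deg=4):
    """Subtract a weighted polynomial fit along the line of sight."""
    # np.polyfit expects weights of 1/sigma for Gaussian uncertainties,
    # which corresponds to inverse-variance weighting of squared residuals.
    coeffs = np.polyfit(x, spectrum, deg, w=1.0 / sigma)
    return spectrum - np.polyval(coeffs, x)

nu = np.linspace(150.0, 158.0, 80)                   # MHz, hypothetical sub-band
x = 2.0 * (nu - nu.mean()) / (nu.max() - nu.min())   # rescaled to [-1, 1]
sigma = 1.0 + 0.5 * x**2                             # hypothetical noise profile

foreground = 1.0e5 * (nu / 150.0) ** -2.6            # smooth power law, mK
smooth_signal = 5.0 * x                              # slowly varying signal part
fast_signal = 10.0 * np.sin(12.0 * np.pi * x)        # rapidly varying signal part

# The fit is linear in the data, so each component can be cleaned separately.
res_fg = clean_los(x, foreground, sigma)
res_smooth = clean_los(x, smooth_signal, sigma)
res_fast = clean_los(x, fast_signal, sigma)

print("foreground residual (mK):", np.abs(res_fg).max())
print("smooth signal left (mK):", np.abs(res_smooth).max())
print("fast signal correlation:", np.corrcoef(res_fast, fast_signal)[0, 1])
```

In this toy case the power-law foreground and the slowly varying signal component are both absorbed almost entirely by the quartic, while the rapidly oscillating component survives nearly intact. This mirrors the large-scale suppression discussed above: whatever part of the signal is spectrally smooth cannot be distinguished from the foregrounds by the fit.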

The results of this paper further reassure us that astrophysical 
foregrounds are unlikely to be the main limiting factor 
in the detection of the EoR information. If the spectrally 
smooth component is negligible in the true cosmological 
signal, our proposed method turns out to be 
feasible. If not, the systematic biases due to foreground 
subtraction appear to be the most worrisome remaining 
problem. Since the proposed method is based on 
the symmetry differences between foregrounds and cosmic 
signal, the details of the foreground and cosmic models 
should have no effect on our qualitative conclusions. 